Data Placement for Efficient Main Memory Access
Abstract
The main memory system is a critical component of modern computer systems. Dynamic Random Access Memory (DRAM) based designs dominate the industry because the device technology is mature and inexpensive. These designs nevertheless face several challenges, which stem from legacy DRAM device design choices, advances in Central Processing Unit (CPU) design, and applications' demand for higher memory throughput and capacity. Because the DRAM industry is highly cost-sensitive, changes to the device architecture are difficult to get adopted. There is thus a need to improve memory system designs, ideally without changing DRAM device architectures.
This dissertation addresses these challenges by leveraging data management. Historically, data management and placement, and their interaction with the memory's hardware characteristics, have been abstracted away at the system software level. We describe mechanisms that use data placement at the Operating System (OS) level to improve memory access latency, power and energy efficiency, and capacity. An important advantage of these schemes is that they require no changes to the DRAM devices and only minor changes to the memory controller hardware; most of the changes are confined to the operating system. The dissertation also explores data management mechanisms for future memory systems built from 3D-stacked DRAM devices and point-to-point interconnects.
Using these schemes, we show improvements in several DRAM metrics. We improve DRAM row-buffer hit rates by co-locating parts of different OS pages in the same row-buffer, which improves performance by 9% and reduces energy consumption by 15%. We also improve page placement to increase opportunities for power-down, which enables a three-fold increase in memory capacity for a given memory power budget. Finally, we show that page placement is an important ingredient in building efficient networks of memories with 3D-stacked memory devices: a design that optimizes page placement and network topology improves performance by 49% and reduces energy by 42%.
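The row-buffer co-location idea can be illustrated with a toy model. The sketch below is not the dissertation's mechanism. It assumes a single DRAM bank with an open-page policy, and the chunk granularity, the reserved row, and all function names are hypothetical. It only shows why moving the frequently accessed chunks of several pages into one row raises the row-buffer hit rate.

```python
from typing import Dict, List, Optional, Tuple

# (page number, chunk index within the page); hypothetical naming for this sketch.
Chunk = Tuple[int, int]


def row_buffer_hit_rate(trace: List[Chunk], placement: Dict[Chunk, int]) -> float:
    """Fraction of accesses that hit the open row in one bank (open-page policy)."""
    open_row: Optional[int] = None
    hits = 0
    for chunk in trace:
        row = placement[chunk]
        if row == open_row:
            hits += 1
        open_row = row  # the accessed row stays open until another row is needed
    return hits / len(trace) if trace else 0.0


def baseline_placement(pages: int, chunks_per_page: int) -> Dict[Chunk, int]:
    """Assumed baseline: every chunk of page p lives in DRAM row p."""
    return {(p, c): p for p in range(pages) for c in range(chunks_per_page)}


def colocated_placement(
    pages: int,
    chunks_per_page: int,
    hot: List[Chunk],
    reserved_row: int,
    row_capacity: int,
) -> Dict[Chunk, int]:
    """Move the hot chunks of different pages into one reserved row."""
    if len(hot) > row_capacity:
        raise ValueError("hot chunks do not fit in the reserved row")
    placement = baseline_placement(pages, chunks_per_page)
    for chunk in hot:
        placement[chunk] = reserved_row
    return placement


if __name__ == "__main__":
    hot_chunks = [(p, 0) for p in range(4)]  # chunk 0 of each page is "hot"
    trace = hot_chunks * 2                   # round-robin over the hot chunks
    base = baseline_placement(4, 4)
    coloc = colocated_placement(4, 4, hot_chunks, reserved_row=4, row_capacity=8)
    print(row_buffer_hit_rate(trace, base))   # 0.0: every access opens a new row
    print(row_buffer_hit_rate(trace, coloc))  # 0.875: only the first access misses
```

The model ignores multiple banks, timing, and the cost of migrating chunks, so the numbers say nothing about the 9% and 15% figures reported above.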
Similar resources
Scaling Up Concurrent Main-Memory Column-Store Scans: Towards Adaptive NUMA-aware Data and Task Placement
Main-memory column-stores are called to efficiently use modern non-uniform memory access (NUMA) architectures to service concurrent clients on big data. The efficient usage of NUMA architectures depends on the data placement and scheduling strategy of the column-store. Most column-stores choose a static strategy that involves partitioning all data across the NUMA architecture, and employing a s...
The Architecture of Direct Data Placement (DDP) and Remote Direct Memory Access (RDMA) on Internet Protocols
This document defines an abstract architecture for Direct Data Placement (DDP) and Remote Direct Memory Access (RDMA) protocols to run on Internet Protocol-suite transports. This architecture does not necessarily reflect the proper way to implement such protocols, but is, rather, a descriptive tool for defining and understanding the protocols. DDP allows the efficient placement of data into buf...
A Study of Page Placement and Migration in Heterogeneous Flat-Addressable Memories
The volume of data generated by research, commercial, industrial, communication, entertainment and other fields is growing exponentially. There is a need for faster and very large amounts of main memory for analyzing such volumes of data in reasonable amounts of time. In addition, these systems need to be as energy efficient as possible, since the energy requirements of most high-performance co...
Adaptive NUMA-aware data placement and task scheduling for analytical workloads in main-memory column-stores
Non-uniform memory access (NUMA) architectures pose numerous performance challenges for main-memory column-stores in scaling up analytics on modern multi-socket multi-core servers. A NUMA-aware execution engine needs a strategy for data placement and task scheduling that prefers fast local memory accesses over remote memory accesses, and avoids an imbalance of resource utilization, both CPU and ...
A Framework for Monitoring Shared Memory Applications
The performance of shared memory programs running on NUMA-characterized architectures significantly depends on the efficient use of the local memories. Most of such applications, however, initially do not show an efficient data placement resulting in poor performance and therefore require further tuning. This kind of tuning needs detailed information about an application’s memory access pattern...